
    Bandit-Based Genetic Programming

    We consider the validation of randomly generated patterns in a Monte-Carlo Tree Search program. Our bandit-based genetic programming (BGP) algorithm, with proven mathematical properties, outperformed a highly optimized handcrafted module of a well-known computer-Go program holding several world records in the game of Go.
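
The abstract does not spell out the bandit rule used by BGP; as a minimal sketch of the bandit side, a standard UCB1 rule can treat each candidate pattern as an arm (the win rates and constants below are hypothetical, not the paper's):

```python
import math
import random

def ucb1_select(counts, rewards, c=math.sqrt(2)):
    """Pick the arm maximizing mean reward plus a UCB1 exploration bonus."""
    total = sum(counts)
    best, best_score = None, float("-inf")
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm at least once
        score = rewards[arm] / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = arm, score
    return best

# Treat each candidate pattern as a bandit arm with an unknown win rate.
random.seed(0)
true_means = [0.3, 0.5, 0.7]        # hypothetical pattern win rates
counts = [0, 0, 0]
rewards = [0.0, 0.0, 0.0]
for _ in range(5000):
    arm = ucb1_select(counts, rewards)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    rewards[arm] += reward
# The strongest pattern ends up played far more often than the others.
```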

    Progress Rate in Noisy Genetic Programming for Choosing λ

    Recently, it has been proposed to use Bernstein races for implementing non-regression testing in noisy genetic programming. We study the population size of such a (1+λ) evolutionary algorithm applied to noisy fitness function optimization through a progress-rate analysis, and test it experimentally on a policy search application.
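
A (1+λ) evolutionary algorithm on a noisy fitness can be sketched as follows; this is a generic illustration (resampling to average out noise, a quadratic toy fitness), not the paper's analysis or its Bernstein-race machinery:

```python
import random

def noisy_fitness(x, rng, noise=0.3):
    """Noisy evaluation of -x^2; the true optimum is x = 0."""
    return -x * x + rng.gauss(0.0, noise)

def one_plus_lambda(lam=10, sigma=0.3, resamples=20, iters=200, seed=1):
    """(1+lambda) ES: keep the parent unless the best of lam children
    beats it; each point is evaluated `resamples` times and averaged
    to reduce the noise on the comparison."""
    rng = random.Random(seed)

    def avg_fit(x):
        return sum(noisy_fitness(x, rng) for _ in range(resamples)) / resamples

    parent = 5.0
    parent_fit = avg_fit(parent)
    for _ in range(iters):
        children = [parent + rng.gauss(0.0, sigma) for _ in range(lam)]
        best = max(children, key=avg_fit)
        best_fit = avg_fit(best)
        if best_fit > parent_fit:
            parent, parent_fit = best, best_fit
    return parent

x = one_plus_lambda()   # converges near the optimum at 0
```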

    On the Parallelization of Monte-Carlo Planning

    We provide parallelizations, with and without shared memory, of bandit-based Monte-Carlo planning algorithms, applied to the game of Go. The resulting algorithm won the first non-blitz game against a professional human player in 9x9 Go.

    Adding expert knowledge and exploration in Monte-Carlo Tree Search

    We present a new exploration term, more efficient than classical UCT-like exploration terms, which efficiently combines expert rules, patterns extracted from datasets, All-Moves-As-First values and classical online values. As this improved bandit formula does not solve several important situations (semeais, nakade) in computer Go, we present three other important improvements which are central to the recent progress of our program MoGo:
    – We show an expert-based improvement of Monte-Carlo simulations for nakade situations; we also emphasize some limitations of this modification.
    – We show a technique which preserves diversity in the Monte-Carlo simulation, which greatly improves the results in 19x19.
    – Whereas the UCB-based exploration term is not efficient in MoGo, we show a new exploration term which is highly efficient in MoGo.
    MoGo recently won a game with handicap 7 against a 9-dan professional player, Zhou JunXun, winner of the LG Cup 2007, and a game with handicap 6 against a 1-dan professional player, Li-Chen Chien.
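
The abstract does not give MoGo's exact bandit formula; the sketch below uses the standard RAVE/AMAF blending schedule from the literature with assumed constants (`c_explore`, `rave_equiv` are hypothetical), to show how a fast offline-style estimate and a slow online value can be combined with an exploration bonus:

```python
import math

def node_score(q_online, n_online, q_rave, n_rave, n_parent,
               c_explore=0.3, rave_equiv=1000.0):
    """Score a move by blending the slow online mean value with the
    fast RAVE / All-Moves-As-First mean, plus a UCT-style exploration
    bonus. beta is close to 1 when there are few online visits (trust
    RAVE) and decays to 0 as online statistics accumulate."""
    beta = n_rave / (n_rave + n_online + n_rave * n_online / rave_equiv)
    value = (1.0 - beta) * q_online + beta * q_rave
    bonus = c_explore * math.sqrt(math.log(n_parent + 1) / (n_online + 1))
    return value + bonus

# With few online visits the RAVE estimate dominates; with many visits
# the online value takes over and the exploration bonus shrinks.
few = node_score(q_online=0.2, n_online=2, q_rave=0.8, n_rave=50, n_parent=100)
many = node_score(q_online=0.2, n_online=500, q_rave=0.8, n_rave=50, n_parent=1000)
```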

    Grid coevolution for adaptive simulations; application to the building of opening books in the game of Go

    This paper presents a successful application of parallel (grid) coevolution to the building of an opening book (OB) in 9x9 Go. Known sayings around the game of Go are rediscovered by the algorithm, and the resulting program was also able to credibly comment openings in professional games of 9x9 Go. Interestingly, beyond the application to the game of Go, our algorithm can be seen as a "meta"-level for the UCT algorithm: "UCT applied to UCT" (instead of "UCT applied to a random player", as usual), in order to build an OB. It is generic and could be applied as well to analyzing a given situation of a Markov Decision Process.

    A Principled Method for Exploiting Opening Books

    In the past we used a great deal of computational power and human expertise to obtain a very big dataset of good 9x9 Go games, in order to build an opening book, and we considerably improved the algorithm used for generating these games. Unfortunately, the results were not very robust, as (i) opening books are definitely not transitive, making non-regression testing extremely difficult; (ii) different time settings lead to opposite conclusions, because a good opening for a game with 10s per move on a single core is very different from a good opening for a game with 30s per move on a 32-core machine; and (iii) some very bad moves sometimes occur. In this paper, we formalize the optimization of an opening book as a matrix game, compute the Nash equilibrium, and conclude that a naturally randomized opening book provides optimal performance (in the sense of Nash equilibria); surprisingly, from a finite set of opening books, we can choose a distribution over these opening books so that this random solution performs significantly better than each of the deterministic opening books.
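
The idea of a randomized opening book as a Nash equilibrium of a matrix game can be illustrated with fictitious play on a toy payoff matrix (the matrix and names below are hypothetical, and fictitious play is just one generic solver, not necessarily the paper's method):

```python
import random

def fictitious_play(payoff, iters=20000, seed=0):
    """Approximate a Nash equilibrium of a zero-sum matrix game by
    fictitious play: each player repeatedly best-responds to the
    opponent's empirical mixture of past plays. Rows: our opening
    books; columns: the opponent's. Returns the row player's mixture."""
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row = rng.randrange(n_rows)
    col = rng.randrange(n_cols)
    for _ in range(iters):
        row_counts[row] += 1
        col_counts[col] += 1
        # Row player (maximizer) best-responds to the column mixture.
        row = max(range(n_rows),
                  key=lambda r: sum(payoff[r][c] * col_counts[c]
                                    for c in range(n_cols)))
        # Column player (minimizer) best-responds to the row mixture.
        col = min(range(n_cols),
                  key=lambda c: sum(payoff[r][c] * row_counts[r]
                                    for r in range(n_rows)))
    total = sum(row_counts)
    return [k / total for k in row_counts]

# Matching-pennies-like game: neither deterministic book is optimal,
# but the 50/50 mixture guarantees the game value.
payoff = [[1.0, 0.0],
          [0.0, 1.0]]
mix = fictitious_play(payoff)   # close to [0.5, 0.5]
```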

    Continuous Upper Confidence Trees

    Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games like the game of Go, they are in particular surprisingly efficient in high-dimensional problems. It is known that they can be adapted to continuous domains in some cases (in particular continuous action spaces). We here present an extension of Upper Confidence Trees to continuous stochastic problems. We (i) show a deceptive problem on which the classical Upper Confidence Tree approach does not work, even with arbitrarily large computational power and with progressive widening; (ii) propose an improvement, termed double progressive widening, which takes care of the compromise between variance (we want infinitely many simulations for each action/state) and bias (we want sufficiently many nodes to avoid a bias by the first nodes) and which extends classical progressive widening; (iii) discuss its consistency and show experimentally that it performs well on the deceptive problem and on experimental benchmarks. We conjecture that the double progressive widening trick can be used in other algorithms as well, as a general tool for ensuring a good bias/variance compromise in search algorithms.
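
A minimal sketch of double progressive widening, under assumed parameters (the widening law `ceil(c * n^alpha)` with c = 1, alpha = 0.5 is a common choice, not necessarily the paper's): both the set of actions tried at a node and, for each action, the set of sampled next states are capped by a limit that grows with the visit count.

```python
import math
import random

def width_limit(n_visits, c=1.0, alpha=0.5):
    """Progressive widening law: allow at most ceil(c * n^alpha) children."""
    return math.ceil(c * n_visits ** alpha)

class DPWNode:
    """Double progressive widening: widen both the action set (for a
    continuous action space) and, per action, the sampled next states
    (for stochastic transitions) as the visit count grows."""
    def __init__(self):
        self.visits = 0
        self.actions = []    # first widening: actions tried at this node
        self.outcomes = {}   # second widening: action -> sampled next states

    def maybe_add_action(self, sample_action):
        if len(self.actions) < width_limit(self.visits):
            self.actions.append(sample_action())

    def maybe_add_outcome(self, action, sample_state):
        outs = self.outcomes.setdefault(action, [])
        if len(outs) < width_limit(self.visits):
            outs.append(sample_state())

rng = random.Random(0)
node = DPWNode()
for _ in range(100):
    node.visits += 1
    node.maybe_add_action(lambda: rng.uniform(-1.0, 1.0))    # continuous action
    node.maybe_add_outcome("a", lambda: rng.gauss(0.0, 1.0))  # stochastic state
# After 100 visits with alpha = 0.5, each set holds ceil(100**0.5) = 10 items,
# so every stored child still receives many simulations on average.
```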

    Combining Expert, Offline, Transient and Online Knowledge for Monte-Carlo Tree Exploration

    We combine, for Monte-Carlo tree exploration, machine learning at four different time scales:
    – online regret, through the use of bandit algorithms and Monte-Carlo estimates;
    – transient learning, through the use of rapid action value estimates (RAVE), which are learnt online and used for accelerating the exploration, and are thereafter gradually set aside as finer information becomes available;
    – offline learning, by data mining of datasets of games;
    – use of expert knowledge as prior information.
    The resulting algorithm is stronger than each element taken separately. We also exhibit an exploration-exploitation dilemma in Monte-Carlo tree exploration and obtain a very strong improvement by tuning the corresponding parameters.
